{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this example, we cluster our alanine dipeptide trajectory using the [RMSD distance metric](http://en.wikipedia.org/wiki/Root-mean-square_deviation_of_atomic_positions) and [hierarchical clustering](https://en.wikipedia.org/wiki/Hierarchical_clustering)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "from __future__ import print_function\n",
    "%matplotlib inline\n",
    "import mdtraj as md\n",
    "import numpy as np\n",
    "import matplotlib.pyplot as plt\n",
    "import scipy.cluster.hierarchy\n",
    "from scipy.spatial.distance import squareform"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's load up our trajectory. This is the trajectory that we generated in the \"Running a simulation in OpenMM and analyzing the results with mdtraj\" example. The first step is to build the rmsd cache, which precalculates some values for the RMSD computation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "traj = md.load('ala2.h5')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lets compute all pairwise rmsds between conformations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "distances = np.empty((traj.n_frames, traj.n_frames))\n",
    "for i in range(traj.n_frames):\n",
    "    distances[i] = md.rmsd(traj, traj, i)\n",
    "print('Max pairwise rmsd: %f nm' % np.max(distances))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "`scipy.cluster` implements the average linkage algorithm (among others)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "# Clustering only accepts reduced form. Squareform's checks are too stringent\n",
    "assert np.all(distances - distances.T < 1e-6)\n",
    "reduced_distances = squareform(distances, checks=False)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "linkage = scipy.cluster.hierarchy.linkage(reduced_distances, method='average')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Lets plot the resulting dendrogram."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": false
   },
   "outputs": [],
   "source": [
    "plt.title('RMSD Average linkage hierarchical clustering')\n",
    "_ = scipy.cluster.hierarchy.dendrogram(linkage, no_labels=True, count_sort='descendent')"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.1"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 0
}
